Thresher: determining the number of clusters while removing outliers

Related articles

Automatically Determining Number of Clusters

Automatically determining number of clusters in the data is an unsolved/unexplored problem. First I'll show why we need to do this, and whether this is a reasonable problem in text clustering in particular. Then starting from simple 1-d/2-d study, I find BIC (Bayesian Information Criterion) is a useful measure, which by penalizing model fitness by model complexity, usually tells the right number of...
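
For reference, the trade-off between fit and complexity mentioned in this abstract can be written with the standard textbook definition of BIC. This is the general formula, not necessarily the exact variant used in the truncated paper; the symbols $\hat{L}_k$ (maximized likelihood of a model with $k$ clusters), $p_k$ (number of free parameters) and $n$ (number of data points) are notation assumed here.

```latex
\mathrm{BIC}(k) = -2 \ln \hat{L}_k + p_k \ln n,
\qquad
\hat{k} = \arg\min_{k} \mathrm{BIC}(k)
```

Adding clusters raises $\ln \hat{L}_k$ but also raises $p_k$, so the minimum marks the point where extra clusters no longer pay for their added parameters.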

Removing Outliers in Illumination Estimation

A method of outlier detection is proposed as a way of improving illumination-estimation performance in general, and for scenes with multiple sources of illumination in particular. Based on random sample consensus (RANSAC), the proposed method (i) makes estimates of the illumination chromaticity from multiple, randomly sampled sub-images of the input image; (ii) fits a model to the estimates; (i...

Probabilistic estimation while ignoring outliers

Logistic regression learns a parameterized mapping from feature vectors to probability vectors and is, for example, central to estimating click rates for ads on web pages. The parameter is found by minimizing the logistic loss. However, minimizing any convex loss summed over a set of examples is prone to outliers. We define a versatile method for designing non-convex losses that ameliorate the eff...

Determining the Number of Trace Clusters: a Stability-based Approach

Given the complexity of real-life event logs, several trace clustering techniques have been proposed to partition an event log into subsets with a lower degree of variation. In general, these techniques assume that the number of clusters is known in advance. However, this will rarely be the case in practice. Therefore, this paper is the first to present an approach to determine the appropriate ...

Determining the Number of Clusters via Iterative Consensus Clustering

We use a cluster ensemble to determine the number of clusters, k, in a group of data. A consensus similarity matrix is formed from the ensemble using multiple algorithms and several values for k. A random walk is induced on the graph defined by the consensus matrix and the eigenvalues of the associated transition probability matrix are used to determine the number of clusters. For noisy or high...
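
The pipeline described in this abstract (consensus matrix, random walk, eigenvalues) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names `consensus_matrix` and `estimate_k` are hypothetical, the ensemble is a hand-written toy rather than the output of multiple algorithms, and the largest-eigengap rule is one common heuristic, not necessarily the criterion the paper uses.

```python
import numpy as np

def consensus_matrix(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    partitions = np.asarray(partitions)            # shape (n_partitions, n_points)
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)

def estimate_k(consensus):
    """Guess k from the largest gap in the random-walk eigenvalue spectrum.

    Assumption: an eigengap heuristic stands in for the paper's criterion.
    """
    degree = consensus.sum(axis=1)
    transition = consensus / degree[:, None]       # row-stochastic P = D^-1 S
    eigvals = np.sort(np.linalg.eigvals(transition).real)[::-1]
    gaps = eigvals[:-1] - eigvals[1:]              # gap after the 1st, 2nd, ... eigenvalue
    return int(np.argmax(gaps)) + 1

# Toy ensemble (hypothetical): three partitions of six points that
# mostly agree on two groups, {0, 1, 2} and {3, 4, 5}.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 2, 2, 2],
]
S = consensus_matrix(partitions)
k_hat = estimate_k(S)
print(k_hat)
```

Well-separated groups make the transition matrix nearly block diagonal, so it has about as many eigenvalues close to 1 as there are clusters; the gap after them is what the heuristic looks for.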

Journal

Journal title: BMC Bioinformatics

Year: 2018

ISSN: 1471-2105

DOI: 10.1186/s12859-017-1998-9